My ML Project

Authors
Affiliation

Name I, First Name I

Name of the University

Name II, First Name II

Published

April 29, 2024

Abstract

The following machine learning project focuses on…


1 Introduction

  • Overview and Motivation
  • Related Work
  • Research questions

2 Testing that R and Python work

#> [1] "hello"
#> 30.0

3 Data

  • Sources
  • Description
  • Wrangling/cleaning
  • Spotting mistakes and missing data (could be part of EDA too)
  • Listing anomalies and outliers (could be part of EDA too)

3.1 Loading and small cleaning (not complete for now)

#>     price number_of_rooms            address canton property_type
#> 1 1800000              65 1844 Villeneuve VD   Vaud     Apartment
#> 2 1980000              55      1820 Montreux   Vaud     Apartment
#> 3  488000              35         1882 Gryon   Vaud     Apartment
#> 4 1755000               7      1820 Montreux   Vaud     Apartment
#> 5  650000              25       1815 Clarens   Vaud     Apartment
#> 6 1490000              45          1260 Nyon   Vaud     Apartment
#>   floor year_category
#> 1    eg        0-1919
#> 2    eg        0-1919
#> 3    eg        0-1919
#> 4    eg        0-1919
#> 5    eg        0-1919
#> 6    eg        0-1919

3.2 Creating Variable zip_code and merging with AMTOVZ_CSV_LV95

#>     price number_of_rooms            address canton property_type
#> 1 1800000              65 1844 Villeneuve VD   Vaud     Apartment
#> 2 1980000              55      1820 Montreux   Vaud     Apartment
#> 3  488000              35         1882 Gryon   Vaud     Apartment
#> 4 1755000               7      1820 Montreux   Vaud     Apartment
#> 5  650000              25       1815 Clarens   Vaud     Apartment
#> 6 1490000              45          1260 Nyon   Vaud     Apartment
#>   floor year_category zip_code
#> 1    eg        0-1919     1844
#> 2    eg        0-1919     1820
#> 3    eg        0-1919     1882
#> 4    eg        0-1919     1820
#> 5    eg        0-1919     1815
#> 6    eg        0-1919     1260
#>       Ortschaftsname  PLZ Zusatzziffer       Gemeindename BFS.Nr
#> 1    Aeugst am Albis 8914            0    Aeugst am Albis      1
#> 2        Aeugstertal 8914            2    Aeugst am Albis      1
#> 3          Zwillikon 8909            0 Affoltern am Albis      2
#> 4 Affoltern am Albis 8910            0 Affoltern am Albis      2
#> 5         Bonstetten 8906            0         Bonstetten      3
#> 6          Sihlbrugg 6340            4    Hausen am Albis      4
#>   Kantonskürzel       E       N Sprache   Validity
#> 1            ZH 2679403 1235842      de 2008-07-01
#> 2            ZH 2679815 1237404      de 2008-07-01
#> 3            ZH 2675280 1238108      de 2008-07-01
#> 4            ZH 2676852 1236930      de 2008-07-01
#> 5            ZH 2677412 1241078      de 2008-07-01
#> 6            ZH 2686082 1230649      de 2008-07-01
#>                 City zip_code Canton_code
#> 1    Aeugst am Albis     8914          ZH
#> 2        Aeugstertal     8914          ZH
#> 3          Zwillikon     8909          ZH
#> 4 Affoltern am Albis     8910          ZH
#> 5         Bonstetten     8906          ZH
#> 6          Sihlbrugg     6340          ZH
#>       zip_code    price number_of_rooms
#> 1           25  2200000              10
#> 2           25  2200000              65
#> 3           26  1995000              75
#> 4           26   870490              45
#> 5          322   870000              25
#> 6          322  1295770              45
#> 2253      1200  2450000               6
#> 2254      1200   982130              45
#> 11886     1919  2535730              55
#> 11887     1919   230000              15
#> 11888     1919  1415380              35
#> 11889     1919  1043260              45
#> 11890     1919  2535730              55
#> 17993     2500  1050000              45
#> 17994     2500  1100000               5
#> 17995     2500   887500              55
#> 17996     2500   870500              45
#> 17997     2500  1176820              45
#> 17998     2500  1159550              35
#> 17999     2500  1927050              45
#> 18000     2500   892500              45
#> 18001     2500   887500              45
#> 18002     2500   420000              45
#> 18003     2500   877500              45
#> 18004     2500   885500              55
#> 18005     2500   872500              45
#> 19603     3000  1448610              45
#> 19604     3000  1515060              45
#> 19605     3000   956880              45
#> 19606     3000  1222680              35
#> 19607     3000  1448610              45
#> 19608     3000  1448610              45
#> 19609     3000  1515060              45
#> 19610     3000   820000              55
#> 19611     3000  1222680              35
#> 19612     3000  1590000              55
#> 19613     3000  1448610              45
#> 27169     4000  2100000              65
#> 27170     4000   975000              45
#> 30708     5201   963520              45
#> 33490     6511   584760               3
#> 33927     6547 19935000              55
#> 35207     6602   270000              15
#> 35208     6602  3721200              55
#> 35209     6602  3721200              55
#> 35210     6604  2644710              35
#> 35211     6604  2644710              35
#> 35212     6604  1142940              45
#> 35213     6604   610000              25
#> 35214     6604   810690              35
#> 35215     6604   860000              35
#> 35216     6604   917010              45
#> 35217     6604  1010040              45
#> 40817     6901  3628170              45
#> 40818     6911   877140              55
#> 40819     6911   810690              45
#> 40820     6911   730950              45
#> 40821     6911   465150              35
#> 42848     7133  2246010              35
#> 42861     7135  3575010              65
#> 43231     8000  1295770              35
#> 43232     8000  2100000              45
#> 43233     8000  2495000              55
#> 44144     8238   739000              35
#> 44145     8238   739000              35
#> 44146     8238   716000              35
#> 44147     8238   716000              35
#> 44148     8238   325600               3
#> 44889     8423  2910510              45
#> 44890     8423  2804190              55
#> 47001     9002  3787650              45
#> 47621     9241   724300              35
#>                                                  address
#> 1                                       1000 Lausanne 25
#> 2                                       1000 Lausanne 25
#> 3                          Lausanne 26, 1000 Lausanne 26
#> 4                                       1000 Lausanne 26
#> 5                    Via Cuolm Liung 30d, 7032 Laax GR 2
#> 6                       Via Murschetg 29, 7032 Laax GR 2
#> 2253                                         1200 Genève
#> 2254  Chemin des pralets, 74100 Etrembières, 1200 Genève
#> 11886                                      1919 Martigny
#> 11887                                      1919 Martigny
#> 11888                                      1919 Martigny
#> 11889                                      1919 Martigny
#> 11890                                      1919 Martigny
#> 17993                    Hohlenweg 11b, 2500 Biel/Bienne
#> 17994                                   2500 Biel/Bienne
#> 17995                                   2500 Biel/Bienne
#> 17996                                   2500 Biel/Bienne
#> 17997                                   2500 Biel/Bienne
#> 17998                                   2500 Biel/Bienne
#> 17999                                        2500 Bienne
#> 18000                                   2500 Biel/Bienne
#> 18001                                   2500 Biel/Bienne
#> 18002                                   2500 Biel/Bienne
#> 18003                                   2500 Biel/Bienne
#> 18004                                   2500 Biel/Bienne
#> 18005                                   2500 Biel/Bienne
#> 19603                                          3000 Bern
#> 19604                                          3000 Bern
#> 19605                                          3000 Bern
#> 19606                                          3000 Bern
#> 19607                                          3000 Bern
#> 19608                                          3000 Bern
#> 19609                                          3000 Bern
#> 19610                                          3000 Bern
#> 19611                                          3000 Bern
#> 19612                                          3000 Bern
#> 19613                                          3000 Bern
#> 27169                                         4000 Basel
#> 27170                                         4000 Basel
#> 30708                                      5201 Brugg AG
#> 33490                                     6511 Cadenazzo
#> 33927                               Augio 1F, 6547 Augio
#> 35207                                       6602 Muralto
#> 35208                                       6602 Muralto
#> 35209                                       6602 Muralto
#> 35210                                       6604 Solduno
#> 35211                                       6604 Solduno
#> 35212                                       6604 Solduno
#> 35213                                       6604 Solduno
#> 35214                                       6604 Solduno
#> 35215                                       6604 Solduno
#> 35216                                       6604 Locarno
#> 35217                                       6604 Locarno
#> 40817                                        6901 Lugano
#> 40818                             6911 Campione d'Italia
#> 40819                             6911 Campione d'Italia
#> 40820                             6911 Campione d'Italia
#> 40821                             6911 Campione d'Italia
#> 42848                  Inder Platenga 34, 7133 Obersaxen
#> 42861                                       7135 Fideris
#> 43231                                        8000 Zürich
#> 43232                                        8000 Zürich
#> 43233                                        8000 Zürich
#> 44144                         8238 Büsingen am Hochrhein
#> 44145                         8238 Büsingen am Hochrhein
#> 44146       Junkerstrasse 85, 8238 Büsingen am Hochrhein
#> 44147       Junkerstrasse 85, 8238 Büsingen am Hochrhein
#> 44148      Stemmerstrasse 14, 8238 Büsingen am Hochrhein
#> 44889                      Chüngstrasse 48, 8423 Embrach
#> 44890                      Chüngstrasse 60, 8423 Embrach
#> 47001                     6900 Lugano 2 Paradiso Caselle
#> 47621                                       9241 Kradolf
#>             canton    property_type floor year_category City
#> 1             Vaud     Single house           1919-1945 <NA>
#> 2             Vaud            Villa           2006-2010 <NA>
#> 3             Vaud            Villa           1961-1970 <NA>
#> 4             Vaud        Apartment noteg     2016-2024 <NA>
#> 5          Grisons        Apartment    eg     2016-2024 <NA>
#> 6          Grisons        Apartment noteg     2011-2015 <NA>
#> 2253        Geneva Bifamiliar house           1981-1990 <NA>
#> 2254        Geneva Bifamiliar house           2016-2024 <NA>
#> 11886       Valais       Attic flat noteg     2016-2024 <NA>
#> 11887       Valais        Apartment    eg     2016-2024 <NA>
#> 11888       Valais        Apartment noteg     2016-2024 <NA>
#> 11889       Valais        Apartment noteg     2016-2024 <NA>
#> 11890       Valais        Apartment noteg     2016-2024 <NA>
#> 17993         Bern     Single house           2001-2005 <NA>
#> 17994         Bern     Single house           2001-2005 <NA>
#> 17995         Bern     Single house           2016-2024 <NA>
#> 17996         Bern     Single house           2016-2024 <NA>
#> 17997         Bern            Villa           2016-2024 <NA>
#> 17998         Bern            Villa           2016-2024 <NA>
#> 17999         Bern     Single house           2016-2024 <NA>
#> 18000         Bern     Single house           2016-2024 <NA>
#> 18001         Bern     Single house           2016-2024 <NA>
#> 18002         Bern        Apartment noteg     1971-1980 <NA>
#> 18003         Bern     Single house           2016-2024 <NA>
#> 18004         Bern     Single house           2016-2024 <NA>
#> 18005         Bern     Single house           2016-2024 <NA>
#> 19603         Bern        Apartment    eg     2016-2024 <NA>
#> 19604         Bern        Apartment    eg     2016-2024 <NA>
#> 19605         Bern        Apartment    eg     2016-2024 <NA>
#> 19606         Bern        Apartment noteg     2016-2024 <NA>
#> 19607         Bern        Apartment noteg     2016-2024 <NA>
#> 19608         Bern        Apartment    eg     2016-2024 <NA>
#> 19609         Bern        Apartment    eg     2016-2024 <NA>
#> 19610         Bern        Apartment noteg     2016-2024 <NA>
#> 19611         Bern           Duplex noteg     2016-2024 <NA>
#> 19612         Bern        Apartment noteg     1991-2000 <NA>
#> 19613         Bern        Roof flat noteg     2016-2024 <NA>
#> 27169  Basel-Stadt            Villa           2016-2024 <NA>
#> 27170  Basel-Stadt     Single house           2016-2024 <NA>
#> 30708       Aargau        Apartment noteg     2016-2024 <NA>
#> 33490       Ticino        Apartment noteg     2016-2024 <NA>
#> 33927      Grisons     Single house           2016-2024 <NA>
#> 35207       Ticino        Apartment    eg     1961-1970 <NA>
#> 35208       Ticino     Single house           1981-1990 <NA>
#> 35209       Ticino     Single house           1981-1990 <NA>
#> 35210       Ticino       Attic flat noteg     2011-2015 <NA>
#> 35211       Ticino        Apartment noteg     2011-2015 <NA>
#> 35212       Ticino        Apartment noteg     2016-2024 <NA>
#> 35213       Ticino        Apartment noteg     2016-2024 <NA>
#> 35214       Ticino        Apartment noteg     2016-2024 <NA>
#> 35215       Ticino        Apartment noteg     2016-2024 <NA>
#> 35216       Ticino        Apartment noteg     2011-2015 <NA>
#> 35217       Ticino        Apartment noteg     2011-2015 <NA>
#> 40817       Ticino       Attic flat noteg     2011-2015 <NA>
#> 40818       Ticino     Single house           1971-1980 <NA>
#> 40819       Ticino        Apartment    eg     1946-1960 <NA>
#> 40820       Ticino        Apartment noteg     1991-2000 <NA>
#> 40821       Ticino        Apartment noteg     1946-1960 <NA>
#> 42848      Grisons     Single house           2006-2010 <NA>
#> 42861      Grisons     Single house              0-1919 <NA>
#> 43231       Zurich     Single house           2016-2024 <NA>
#> 43232       Zurich        Apartment noteg     2016-2024 <NA>
#> 43233       Zurich        Apartment noteg        0-1919 <NA>
#> 44144 Schaffhausen        Apartment    eg     2016-2024 <NA>
#> 44145 Schaffhausen       Attic flat    eg     2016-2024 <NA>
#> 44146 Schaffhausen       Attic flat noteg     2016-2024 <NA>
#> 44147 Schaffhausen        Apartment noteg     2016-2024 <NA>
#> 44148 Schaffhausen        Apartment noteg     1961-1970 <NA>
#> 44889       Zurich     Single house           2016-2024 <NA>
#> 44890       Zurich Bifamiliar house           2016-2024 <NA>
#> 47001       Ticino        Apartment noteg     2011-2015 <NA>
#> 47621      Thurgau        Apartment noteg     1991-2000 <NA>
#>       Canton_code
#> 1            <NA>
#> 2            <NA>
#> 3            <NA>
#> 4            <NA>
#> 5            <NA>
#> 6            <NA>
#> 2253         <NA>
#> 2254         <NA>
#> 11886        <NA>
#> 11887        <NA>
#> 11888        <NA>
#> 11889        <NA>
#> 11890        <NA>
#> 17993        <NA>
#> 17994        <NA>
#> 17995        <NA>
#> 17996        <NA>
#> 17997        <NA>
#> 17998        <NA>
#> 17999        <NA>
#> 18000        <NA>
#> 18001        <NA>
#> 18002        <NA>
#> 18003        <NA>
#> 18004        <NA>
#> 18005        <NA>
#> 19603        <NA>
#> 19604        <NA>
#> 19605        <NA>
#> 19606        <NA>
#> 19607        <NA>
#> 19608        <NA>
#> 19609        <NA>
#> 19610        <NA>
#> 19611        <NA>
#> 19612        <NA>
#> 19613        <NA>
#> 27169        <NA>
#> 27170        <NA>
#> 30708        <NA>
#> 33490        <NA>
#> 33927        <NA>
#> 35207        <NA>
#> 35208        <NA>
#> 35209        <NA>
#> 35210        <NA>
#> 35211        <NA>
#> 35212        <NA>
#> 35213        <NA>
#> 35214        <NA>
#> 35215        <NA>
#> 35216        <NA>
#> 35217        <NA>
#> 40817        <NA>
#> 40818        <NA>
#> 40819        <NA>
#> 40820        <NA>
#> 40821        <NA>
#> 42848        <NA>
#> 42861        <NA>
#> 43231        <NA>
#> 43232        <NA>
#> 43233        <NA>
#> 44144        <NA>
#> 44145        <NA>
#> 44146        <NA>
#> 44147        <NA>
#> 44148        <NA>
#> 44889        <NA>
#> 44890        <NA>
#> 47001        <NA>
#> 47621        <NA>

The merge leaves 144 rows with NA values, for one of two reasons:

  • The zip code was not found in the AMTOVZ data frame
  • The zip code was incorrectly isolated from the address
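The two failure cases listed above can be illustrated with a minimal Python sketch. It assumes that a Swiss zip code is the last standalone four-digit token in the address, which may not hold for every listing. The lookup table is a three-row excerpt standing in for AMTOVZ_CSV_LV95, the names `extract_zip` and `split_matches` are illustrative and not taken from the project code, and the third address is hypothetical.

```python
import re

# Small excerpt standing in for the AMTOVZ lookup (zip_code -> City).
ZIP_LOOKUP = {
    "8914": "Aeugst am Albis",
    "8910": "Affoltern am Albis",
    "8906": "Bonstetten",
}

def extract_zip(address):
    """Return the last standalone four-digit token in the address, or None."""
    matches = re.findall(r"\b\d{4}\b", address)
    return matches[-1] if matches else None

def split_matches(addresses, lookup):
    """Separate addresses whose zip code is in the lookup from those that are not."""
    matched, unmatched = [], []
    for address in addresses:
        zip_code = extract_zip(address)
        if zip_code in lookup:
            matched.append((address, zip_code, lookup[zip_code]))
        else:
            unmatched.append((address, zip_code))
    return matched, unmatched

addresses = [
    "1000 Lausanne 25",                     # trailing district number
    "Via Cuolm Liung 30d, 7032 Laax GR 2",  # street number before the zip code
    "Musterstrasse 1, 8906 Bonstetten",     # hypothetical address
]
matched, unmatched = split_matches(addresses, ZIP_LOOKUP)
print(matched)
print(unmatched)
```

Listing the unmatched addresses together with the extracted value makes it easier to tell whether a row failed because the zip code is absent from the lookup or because the extraction picked the wrong token.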

These rows were removed.

Taxes (impôts)

Properties and taxes merged

Dataset used for the rest of the analysis

3.2.1 Cleaning of commune data

4 EDA

4.1 Change the path below

4.2 Histogram of prices

4.3 Histogram of prices for each property type

Note: only prices between 0 and 500000 are shown, so some outliers do not appear.

4.4 Histogram of prices for each year category

Note: only prices between 0 and 500000 are shown, so some outliers do not appear.

4.5 Histogram of prices for each canton

Note: only prices between 0 and 500000 are shown, so some outliers do not appear.

4.6 Histogram of prices for each number of rooms

Note: only prices between 0 and 500000 are shown, so some outliers do not appear.

The graph below also shows only apartments with fewer than 10 rooms (the code can be changed if needed).
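The display filters used for these histograms can be sketched in Python as follows. This is only an illustration: the function name `price_histogram`, the bin width, and the room counts in the sample rows are assumptions, not the project's plotting code.

```python
from collections import Counter

def price_histogram(rows, max_price=500000, max_rooms=10, bin_width=100000):
    """Count listings per price bin after applying the two display filters."""
    counts = Counter()
    for price, rooms in rows:
        # Keep only listings inside the displayed price range and room limit.
        if 0 <= price <= max_price and rooms < max_rooms:
            lower_edge = (price // bin_width) * bin_width
            counts[lower_edge] += 1
    return dict(sorted(counts.items()))

# Hypothetical (price, number_of_rooms) pairs for illustration only.
rows = [(230000, 1.5), (420000, 4.5), (488000, 3.5), (650000, 2.5), (325600, 3)]
print(price_histogram(rows))
```

Raising `max_price` or `max_rooms` brings the excluded outliers back into the counts.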

4.7 Test Regression

#> 
#> Call:
#> lm(formula = price ~ number_of_rooms + canton + property_type + 
#>     year_category, data = properties)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -7013788  -514438  -138948   264464 21628996 
#> 
#> Coefficients:
#>                               Estimate Std. Error t value Pr(>|t|)
#> (Intercept)                    -677158      55739  -12.15  < 2e-16
#> number_of_rooms                 337946       6166   54.81  < 2e-16
#> cantonappenzell-ausser-rhoden  -464945     126861   -3.66  0.00025
#> cantonappenzell-inner-rhoden   -874289     392590   -2.23  0.02596
#> cantonbasel-landschaft         -195701      57943   -3.38  0.00073
#> cantonbasel-stadt               218682     105130    2.08  0.03753
#> cantonbern                     -478376      46221  -10.35  < 2e-16
#> cantonfribourg                 -781416      48366  -16.16  < 2e-16
#> cantongeneva                   2025260      62234   32.54  < 2e-16
#> cantonglarus                   -573694     173301   -3.31  0.00093
#> cantongrisons                    59982      71666    0.84  0.40262
#> cantonjura                     -801519      77323  -10.37  < 2e-16
#> cantonlucerne                  -187979      73261   -2.57  0.01030
#> cantonneuchatel                -353635      65590   -5.39  7.1e-08
#> cantonnidwalden                 991055     244826    4.05  5.2e-05
#> cantonobwalden                  366062     244712    1.50  0.13470
#> cantonschaffhausen             -584997     120601   -4.85  1.2e-06
#> cantonschwyz                     18070     132558    0.14  0.89157
#> cantonsolothurn                -784557      61024  -12.86  < 2e-16
#> cantonst-gallen                -404890      55918   -7.24  4.6e-13
#> cantonthurgau                   -37337      63444   -0.59  0.55620
#> cantonticino                    125913      38499    3.27  0.00108
#> cantonuri                         9578     155772    0.06  0.95097
#> cantonvalais                   -219964      39781   -5.53  3.3e-08
#> cantonvaud                       89914      40258    2.23  0.02553
#> cantonzug                       801241     153896    5.21  1.9e-07
#> cantonzurich                    316099      49688    6.36  2.0e-10
#> property_typeAttic flat         311019      45964    6.77  1.4e-11
#> property_typeBifamiliar house    41841      42939    0.97  0.32986
#> property_typeChalet            1136804      56690   20.05  < 2e-16
#> property_typeDuplex              -5091      56699   -0.09  0.92846
#> property_typeFarm house         237939     118848    2.00  0.04529
#> property_typeLoft               285442     291977    0.98  0.32827
#> property_typeRoof flat            4801      64587    0.07  0.94074
#> property_typeRustic house      -281265     249068   -1.13  0.25880
#> property_typeSingle house       389066      24252   16.04  < 2e-16
#> property_typeTerrace flat        88662      87071    1.02  0.30856
#> property_typeVilla             1278283      38187   33.47  < 2e-16
#> year_category1919-1945           10462      61602    0.17  0.86515
#> year_category1946-1960           76025      57261    1.33  0.18429
#> year_category1961-1970          232055      48444    4.79  1.7e-06
#> year_category1971-1980          210609      43422    4.85  1.2e-06
#> year_category1981-1990          237789      43679    5.44  5.3e-08
#> year_category1991-2000          477554      45385   10.52  < 2e-16
#> year_category2001-2005          519338      55369    9.38  < 2e-16
#> year_category2006-2010          591351      48030   12.31  < 2e-16
#> year_category2011-2015          724194      47219   15.34  < 2e-16
#> year_category2016-2024          641233      36926   17.37  < 2e-16
#>                                  
#> (Intercept)                   ***
#> number_of_rooms               ***
#> cantonappenzell-ausser-rhoden ***
#> cantonappenzell-inner-rhoden  *  
#> cantonbasel-landschaft        ***
#> cantonbasel-stadt             *  
#> cantonbern                    ***
#> cantonfribourg                ***
#> cantongeneva                  ***
#> cantonglarus                  ***
#> cantongrisons                    
#> cantonjura                    ***
#> cantonlucerne                 *  
#> cantonneuchatel               ***
#> cantonnidwalden               ***
#> cantonobwalden                   
#> cantonschaffhausen            ***
#> cantonschwyz                     
#> cantonsolothurn               ***
#> cantonst-gallen               ***
#> cantonthurgau                    
#> cantonticino                  ** 
#> cantonuri                        
#> cantonvalais                  ***
#> cantonvaud                    *  
#> cantonzug                     ***
#> cantonzurich                  ***
#> property_typeAttic flat       ***
#> property_typeBifamiliar house    
#> property_typeChalet           ***
#> property_typeDuplex              
#> property_typeFarm house       *  
#> property_typeLoft                
#> property_typeRoof flat           
#> property_typeRustic house        
#> property_typeSingle house     ***
#> property_typeTerrace flat        
#> property_typeVilla            ***
#> year_category1919-1945           
#> year_category1946-1960           
#> year_category1961-1970        ***
#> year_category1971-1980        ***
#> year_category1981-1990        ***
#> year_category1991-2000        ***
#> year_category2001-2005        ***
#> year_category2006-2010        ***
#> year_category2011-2015        ***
#> year_category2016-2024        ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1240000 on 21363 degrees of freedom
#>   (72 observations deleted due to missingness)
#> Multiple R-squared:  0.323,  Adjusted R-squared:  0.321 
#> F-statistic:  216 on 47 and 21363 DF,  p-value: <2e-16
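To read the coefficient table, a fitted price is the intercept plus the rooms term plus one effect per categorical variable. The Python sketch below copies a subset of the estimates from the summary above. It assumes that the reference levels are the ones absent from the output (apparently canton `aargau`, property type `Apartment`, and year category `0-1919`) and that `number_of_rooms` is expressed in actual rooms; both points should be checked against the data. The function name `predict_price` is illustrative.

```python
# Estimates copied from the lm() summary (subset only).
INTERCEPT = -677158
ROOMS = 337946

# Reference levels carry an effect of 0; which levels are the reference
# is an assumption inferred from their absence in the summary.
CANTON = {"aargau": 0, "geneva": 2025260, "vaud": 89914, "bern": -478376}
PROPERTY_TYPE = {"Apartment": 0, "Single house": 389066, "Villa": 1278283}
YEAR_CATEGORY = {"0-1919": 0, "1991-2000": 477554, "2016-2024": 641233}

def predict_price(rooms, canton, property_type, year_category):
    """Sum the intercept, the rooms term, and the three categorical effects."""
    return (
        INTERCEPT
        + ROOMS * rooms
        + CANTON[canton]
        + PROPERTY_TYPE[property_type]
        + YEAR_CATEGORY[year_category]
    )

# A 4-room apartment in Geneva built in 2016-2024.
print(predict_price(4, "geneva", "Apartment", "2016-2024"))

# The same property in the reference canton, to isolate the canton effect.
print(predict_price(4, "aargau", "Apartment", "2016-2024"))
```

The difference between the two printed values equals the Geneva coefficient, which is how each canton estimate should be read: as a shift relative to the reference canton with all other variables held fixed.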

5 Supervised learning

  • Data splitting (if a training/test set split is enough for the global analysis, at least one CV or bootstrap must be used)
  • Two or more models
  • Two or more scores
  • Tuning of one or more hyperparameters per model
  • Interpretation of the model(s)
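As a starting point for the data-splitting requirement, k-fold cross-validation can be sketched with the Python standard library alone. The function name `k_fold_indices`, the default of five folds, and the fixed seed are assumptions of this sketch; a library routine could be used instead.

```python
import random

def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx

for train_idx, test_idx in k_fold_indices(10, k=5):
    print(len(train_idx), len(test_idx))
```

Each row appears in exactly one test fold, so every model is scored on data it was not trained on, and the scores can be averaged across folds.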

6 Unsupervised learning

  • Clustering and/or dimension reduction

7 Conclusion

  • Brief summary of the project
  • Take home message
  • Limitations
  • Future work?